-
Notifications
You must be signed in to change notification settings - Fork 871
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OMPI/OSC/UCX: fix issue in impl of MPI_Win_create_dynamic/MPI_Win_attach/MPI_Win_detach #5094
Conversation
…ach/MPI_Win_detach Signed-off-by: Xin Zhao <xinz@mellanox.com>
test code: aint.c (from MPICH test suite) #include <stdio.h>
#include <mpi.h>
int main(int argc, char **argv)
{
int rank, nproc;`
int errs = 0;
int array[1024];
int val = 0;
int target_rank;
MPI_Aint bases[2];
MPI_Aint disp, offset;
MPI_Win win;
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &nproc);
if (rank == 0 && nproc != 2) {
printf("Must run with 2 ranks\n");
}
/* Get the base address in the middle of the array */
if (rank == 0) {
target_rank = 1;
array[0] = 1234;
MPI_Get_address(&array[512], &bases[0]);
} else if (rank == 1) {
target_rank = 0;
array[1023] = 1234;
MPI_Get_address(&array[512], &bases[1]);
}
/* Exchange bases */
MPI_Allgather(MPI_IN_PLACE, 0, MPI_DATATYPE_NULL, bases, 1, MPI_AINT, MPI_COMM_WORLD);
MPI_Win_create_dynamic(MPI_INFO_NULL, MPI_COMM_WORLD, &win);
MPI_Win_attach(win, array, sizeof(int) * 1024);
/* Do MPI_Aint addressing arithmetic */
if (rank == 0) {
disp = sizeof(int) * 511;
offset = MPI_Aint_add(bases[1], disp); /* offset points to array[1023] */
} else if (rank == 1) {
disp = sizeof(int) * 512;
offset = MPI_Aint_diff(bases[0], disp); /* offset points to array[0] */
}
/* Get val and verify it */
MPI_Win_fence(MPI_MODE_NOPRECEDE, win);
MPI_Get(&val, 1, MPI_INT, target_rank, offset, 1, MPI_INT, win);
MPI_Win_fence(MPI_MODE_NOSUCCEED, win);
if (val != 1234) {
errs++;
printf("%d -- Got %d, expected 1234\n", rank, val);
}
MPI_Win_detach(win, array);
MPI_Win_free(&win);
MPI_Finalize();
return 0;
} |
fix issue related to #5083 |
@gpaulsen could you review this PR? This is based on ompi/master, after review is done, I will create a PR for ompi/v3.1.x |
@@ -643,11 +670,116 @@ static int component_select(struct ompi_win_t *win, void **base, size_t size, in | |||
return ret; | |||
} | |||
|
|||
int ompi_osc_find_attached_region_position(ompi_osc_dynamic_win_info_t *dynamic_wins, | |||
int min_index, int max_index, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might want to make this ompi_osc_ucx_find_attached_region_position
From what I can tell this will work but it will be incredibly inefficient. Yet another reason why I recommend a btl. To make this better you will have to re-implement the optimizations already present in osc/rdma :(. |
That said, this fixes a bug and is sufficient for v3.0.x and v3.1.x. |
I'm working to build / test this now. |
@xinzhao3 Have you tried it with this test code? because it segfaults for me still - 3.1.0rc5 + 5094.diff:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see the segfault on ppc64le, and all of my previous osc/ucx failures passed on master.
hm. I'll try again tomorrow again on a clean installation.. |
@xinzhao3 @gpaulsen @jladd-mlnx I confirm that I still get a segfault with the patch on our FDR ConnectX-3 system. I do not get a segfault on our ConnectX-4 installation. If you compare the original report #5083, the crash is different. In particular, the crash is now in
|
@angainor which test code it is? |
@xinzhao3 The one you pasted here (the same as I used in the original post) |
I cherry-picked the fix to the v3.1.x branch, and retested, all looks good. |
@angainor, @xinzhao3, @bwbarrett - In our Open MPI Web-Ex today, we decided to turn the priority of this (OSC UCX) component down to 0 default for the v3.1.x release that we're trying to get out the door. @xinzhao3 will make that PR separately. |
Do we have root cause on why this is causing a different failure on older hardware? Even with a query value set to 0, there's an awful lot of code still being executed in this path, some of which might be the buggy code causing the segfault. Without understanding that segfault, I think we need to consider not shipping the component in 3.1.0. |
@xinzhao3 @jladd-mlnx Can you reproduce my problems? Or might this be my system configuration related to one-sided comm, and not a general ConnectX-3 issue? At least it seems that this is related to UCX itself, and not to this particular PR. In the tests below I've removed the 5094 patch. Running on ConnectX-3, and using hpcx-2.1 release, so at least UCX is up to date. Here I'm trying to run a simple OpenSHMEM program with ucx spml, and get the same error:
If I change the transport from default
I'm compiling OpenMPI 3.0.1 now to double-check. |
@xinzhao3 @jladd-mlnx Well, same thing happens with home-compiled 3.0.1, and with the precompiled version shipped with hpcx 2.1. The OpenSHMEM code starts fine if I use |
@angainor what MOFED version are you running on your CX-3 system? It looks like HCA based atomics aren't working on that system. |
The official statement is that UCX requires extended atomics to support OpenSHMEM semantics, which are not present on the ConnectX-3 device. |
@jladd-mlnx Fair enough; no support on older devices is a perfectly valid vendor decision. But it still probably shouldn't abort in the presence of an older device. |
How did we switch from MPI onesided to shmem in this pull request discussion? Let's stick to the onesided issue; if there are shmem issues, please file a bug report and we'll track those. Back to the onesided fix; @jladd-mlnx and @xinzhao3, how comfortable are you with the crash on CX-3 not being a problem if we set priority to 0 instead of disabling the component entirely? Like to close on that point before we merge the PR. |
@bwbarrett @jladd-mlnx Sorry for mentioning OpenSHMEM, this was only to show that the error I'm getting comes from UCX, and is not OpenMPI specific. But @jsquyres is right in that the code in this example shouldn't segfault when executed without any
|
@bwbarrett we are 100% certain that the crash on CX3 will not be hit if the priority is set to 0. |
@xinzhao3 I really hate to be saying this, but something is still not right. Most of the time the above test code finishes, but once in a while (1/5?) I still do get a segfault running "default" (
|
Very odd. Can't you rebuild with --enable-debug. Might give us a more useful backtrace. |
Before that can you run with |
What might also be interesting to try is |
@hjelmn It was enough to run
Same segfault as above, just without any mention of |
@hjelmn This time I configured with But the |
Interesting. All the line numbers are garbage. Is there a core file you can look at with gdb? |
@hjelmn But maybe I should first get a debug build - that one clearly didn't work. Doesn't |
I can't remember the priority. I thought --enable-debug would override any enable_debug=no in a platform file. I would use the debug platform file and see if that gives an actual debug build. You can check by running ompi_info after its installed. That will indicate whether a build has --enable-debug or not. configure also prints out a line at the end when debug is enabled. |
@angainor get rid of the |
@hjelmn This is the debug stack:
|
Weird. I don't know how the mtl check could be crashing except for memory corruption. Can you run with --mca mtl_base_verbose 100? |
|
Huh, interesting. I wonder if mxm is causing the issue. Try --mca mtl ^mxm |
This is the sequence of events from what I can tell. It is unrelated to this PR. At some point Intel added a check to osc/rdma to disable its use with OPA because verbs over OPA is not recommended. The problem is here we have this sequence:
*&^$#)($ |
@hjelmn Running in a So yes, not related to this PR. |
Ok, so the problem @angainor reported is unrelated to this PR. PR incoming to fix that problem. |
See #5136 |
@bwbarrett @xinzhao3 This PR together with #5135 fixes the |
@xinzhao3 Of course, the segfault is still there if I force |
Great news! |
@jladd-mlnx Yeah, this puts v3.1.0 on track for release and gives time to identify the remaining SEGV for v3.1.1. |
@xinzhao3 please cherry-pick into 3.1.x branch. |
The original impl for MPI_Win_create_dynamic is not correct. This patch redo this implementation.
Signed-off-by: Xin Zhao xinz@mellanox.com